\section{Performance Evaluation}\label{performanceEvaluation}
In this section, we conduct extensive simulations to evaluate the performance of the proposed protocols in a large scale RFID system that consists of multiple tag categories. As discussed in Section~\ref{discussionMultiReaders}, the running of our protocols in multi-reader scenarios is almost the same as that in the single reader case. Hence, following most literature on the topic of RFID cardinality estimation \cite{AlexMobicom12Everybit} \cite{xiulongICNP2014} \cite{PETZheng} \cite{qian2011cardinality}, we (1) simulate a single reader in all experiments and assume this reader has enough communication range to probe all RFID tags; (2) consider an error-free communication channel. 

In the following, we first conduct simulations to evaluate the effectiveness of our batch-splitting strategy by comparing our SEM+AP with SEM. Secondly, we conduct extensive simulations to evaluate the actual reliability of SEM+AP. Finally, we compare SEM+AP with existing estimation protocols to evaluate its time-efficiency. We run each simulation 1000 times and report the detailed experimental results in the following.

\subsection{Evaluating the Batch Splitting Strategy}
This section aims at investigating the effectiveness of our adaptive batch-splitting strategy. We simulate 10 tag categories in a RFID system. In the first set of simulations, the cardinalities of 10 tag categories are \emph{balanced}. Specifically, each category has the same cardinality of 5000 tags. In this case, we conduct simulations with different accuracy requirement and different settings of $\ell$. Simulation results in Fig.~\ref{balancedPerformance} reveal that the running time of SEM and SEM+AP are almost the same in such a \emph{balanced} system. As aforementioned, the underlying reason is that we can find a proper initial frame length $f$ fitting all categories well. As a result, we do not need to split \emph{on-the-fly-list} into multiple batches. Then, SEM+AP works the same as SEM in such a balanced RFID system.
\begin{figure*}[htb]
%\setlength{\belowcaptionskip}{-0.1cm}
\centerline{\includegraphics[scale=0.57]{Fig/balancedPerformance}}
\vspace{-0.1in}
\caption{Comparing SEM+AP with SEM in the scenario with \emph{balanced} cardinality distribution. Each category has the same size of 5000 tags.}
\label{balancedPerformance}
\end{figure*}

In the second set of simulations, the cardinalities of 10 tag categories vary from 1000 to 50000, and follows \emph{exponential} distribution. The simulation results shown in Fig.~\ref{ExpUnbalancedPerformance} reveal that SEM+AP outperforms SEM by reducing nearly 50\% of the execution time. For example, as illustrated in Fig.~\ref{ExpUnbalancedPerformance}(a), the execution time of SEM ($\ell=0$) is around 6.3 seconds. And the execution time of SEM+AP ($\ell=0$) is around just 3.2 seconds, which represents about 50\% improvement. The simulation results with other parameter settings also reveal the persistent superiority of SEM+AP over SEM.
\begin{figure*}[htb]
%\setlength{\belowcaptionskip}{-0.1cm}
\centerline{\includegraphics[scale=0.57]{Fig/ExpUnbalancedPerformance}}
\vspace{-0.1in}
\caption{Comparing SEM+AP with SEM in the scenario with \emph{unbalanced} cardinality distribution. The cardinalities of 10 categories are of exponential distribution.}
\label{ExpUnbalancedPerformance}
\end{figure*}


In the third set of simulations, the cardinalities of 10 tag categories also vary from 1000 to 50000, but follows \emph{linear} distribution. The simulation results shown in Fig.~\ref{LinearUnbalancePerformance} also reveal that SEM+AP has 50\% improvement compared with SEM. We conclude that in an unbalanced RFID system (no matter with exp. distribution or linear distribution), SEM+AP persistently outperforms SEM. The underlying reason is that SEM+AP can adaptively split an unbalanced batch into multiple balanced batches, and can find a proper initial frame size $f$ for each batch. 
\begin{figure*}[htb]
%\setlength{\belowcaptionskip}{-0.1cm}
\centerline{\includegraphics[scale=0.57]{Fig/LinearUnbalancePerformance}}
\vspace{-0.1in}
\caption{ComparingSEM+AP with SEM in the scenario with \emph{unbalanced} cardinality distribution. The cardinalities of 10 categories are of linear distribution.}
\label{LinearUnbalancePerformance}
\end{figure*}

\subsection{Evaluating the Actual Reliability}
The estimation accuracy is one of the most important metrics in this paper. Therefore, extensive simulations are conducted to evaluate the \emph{actual reliability} of SEM+AP. Recall that each simulation is independently repeated 1000 times. For category $C_i$, we refer to the estimation results within $[(1-\alpha)n_i, (1+\alpha)n_i]$ as \emph{successful} estimation, where $n_i$ is the actual cardinality of category $C_i$. The so called actual reliability is defined as the number of successful estimations divided by 1000. The simulation results in Fig.~\ref{AccuracyBalancedDistri} present the actual reliability of SEM+AP in a balanced RFID system, where the cardinalities of different categories are as Fig.~\ref{balancedPerformance}(a). It is easy to observe that the actual reliability of each category is higher than the required reliability~$\beta$.

We also conduct simulations in \emph{unbalanced} (i.e., \emph{exp.} or \emph{linear} distribution) RFID systems. The tag cardinality distribution of these two types of unbalanced scenarios are the same as Fig.~\ref{ExpUnbalancedPerformance}(a) and Fig.~\ref{LinearUnbalancePerformance}(a), respectively. The simulation results in Figs.~\ref{AccuracyExpDistri} and \ref{AccuracyLinearDistri} reveal that SEM+AP ($\ell=0$) sometimes cannot satisfy the required reliability. This phenomenon is the so called \emph{premature termination}, which has been discussed in Section~\ref{PrematureTer}. Fortunately, we find that the actual reliability of SEM+AP ($\ell=1$) is higher than SEM+AP ($\ell=0$), and can always satisfy the required reliability $\beta$. This demonstrates that our $\ell$-sigma method can effectively avoid the premature termination.
\begin{figure}[htb]
%\setlength{\belowcaptionskip}{-0.1cm}
\centerline{\includegraphics[scale=0.17]{Fig/AccuracyBalancedDistri}}
\vspace{-0.1in}
\caption{Evaluating the actual reliability of SEM+AP in the scenario with \emph{balanced category distribution} (See Fig.~\ref{balancedPerformance}(a)).}
\label{AccuracyBalancedDistri}
\end{figure}
\begin{figure}[htb]
%\setlength{\belowcaptionskip}{-0.1cm}
\centerline{\includegraphics[scale=0.17]{Fig/AccuracyExpDistri}}
\vspace{-0.1in}
\caption{Evaluating the actual reliability of SEM+AP in the scenario with \emph{unbalanced (exponential) category distribution} (See Fig.~\ref{ExpUnbalancedPerformance}(a)).}
\label{AccuracyExpDistri}
\end{figure}
\begin{figure}[htb]
%\setlength{\belowcaptionskip}{-0.1cm}
\centerline{\includegraphics[scale=0.17]{Fig/AccuracyLinearDistri}}
\vspace{-0.1in}
\caption{Evaluating the actual reliability of SEM+AP in the scenario with \emph{unbalanced (linear) category distribution} (See Fig.~\ref{LinearUnbalancePerformance}(a)).}
\label{AccuracyLinearDistri}
\end{figure}
\vspace{-0.1in}
\subsection{Comparison with Existing Protocols}
Besides the estimation reliability, another important metric that needs to be considered is time-efficiency. We compare SEM+AP with five excellent estimation protocols: Maximum Likelihood Estimator (MLE) \cite{li2010energy}, Enhanced Zero Based estimator (EZB) \cite{AnonymousTracking}, Unified Probabilistic Estimator (UPE) \cite{kodialam2006fast}, Average
Run based Tag estimation (ART)~\cite{AlexMobicom12Everybit}, and Simple RFID Counting Protocol for Single-Set (SRCs)~\cite{ChenMobicom}. We could use these estimation protocols to separately estimate the cardinality of each category one by one. Xie \emph{et al.} proposed an Ensemble Sampling estimator (ES) to simultaneously estimate the cardinalities of the top-$k$ largest categories. For comprehensive comparison, we also treat ES as one of the benchmark~protocols.


In the simulations corresponding to Fig.~\ref{balancedTimeEfficiency}(b), we vary the number of tag categories from 2 to 50. For a fixed number of categories, each category randomly picks a cardinality according to the probability distribution in Fig.~\ref{balancedTimeEfficiency}(a). For example, the probability corresponding to 10000 is 0.25, which means that an arbitrary category has a 25\% likelihood of being assigned a cardinality of 10000 when simulating the RFID system. Since the picked cardinalities are within a \emph{relatively small} range $[8000,12000]$, all tag categories will have similar cardinalities. This can be interpreted as a \emph{balanced} scenario. The simulation results in  Fig.~\ref{balancedTimeEfficiency}(b) demonstrate that our SEM+AP significantly outperforms all benchmark protocols. For example, when there are 50 categories in the system, the execution time of the fastest benchmark protocol, i.e., SCRs, is 29 seconds. And the execution time of our SEM+AP protocol is just 3.2 seconds, which is nearly 10 times faster than SCRs. On the other hand, as illustrated in Fig.~\ref{UnbalancedTimeEfficiency}(a), each category uniform-randomly picks a cardinality from a \emph{wide range} of $[1000,20000]$. Clearly, this will generate an \emph{unbalanced} RFID system. We make two important observations:~(1)~The execution time of the benchmark protocols is almost the same as that in Fig.~\ref{balancedTimeEfficiency}(b). The underlying reason is explained as follows. The execution time of separate estimation protocols \cite{AlexMobicom12Everybit} \cite{li2010energy} \cite{AnonymousTracking} \cite{kodialam2006fast} \cite{ChenMobicom} only depends on the required accuracy, which is independent from the tag cardinality. Therefore, no matter the type of cardinality distribution, the total execution is proportional of the category number. (2) Our SEM+AP protocol persistently runs several times faster than all the other benchmark protocols in the scenario with unbalanced categories.


\begin{figure}[htb]
%\setlength{\belowcaptionskip}{-0.1cm}
\centerline{\includegraphics[scale=0.17]{Fig/balancedTimeEfficiency}}
\vspace{-0.1in}
\caption{Evaluating the time-efficiency of SEM+AP ($\ell=1$) and other benchmark protocols in \emph{relatively balanced} RFID systems.}
\label{balancedTimeEfficiency}
\end{figure}
\begin{figure}[htb]
%\setlength{\belowcaptionskip}{-0.1cm}
\centerline{\includegraphics[scale=0.17]{Fig/UnbalancedTimeEfficiency}}
\vspace{-0.1in}
\caption{Evaluating the time-efficiency of SEM+AP ($\ell=1$) and other benchmark protocols in \emph{unbalanced} RFID systems.}
\label{UnbalancedTimeEfficiency}
\end{figure}



%\begin{figure*}
%\begin{center}
%\begin{minipage}[c]{0.5\textwidth}
%\centering
%\includegraphics[scale=0.15]{Fig/AccuracyBalancedDistri.eps}
%\centering
%\caption{}
%\label{fig:levfig}
%\end{minipage}%
%\begin{minipage}[c]{0.5\textwidth}
%\centering
%\includegraphics[scale=0.15]{Fig/AccuracyExpDistri.eps}
%\caption{}
%\label{fig:levfig}
%\end{minipage}
%\begin{minipage}[c]{0.5\textwidth}
%\centering
%\includegraphics[scale=0.15]{Fig/AccuracyLinearDistri.eps}
%\caption{}
%\label{fig:levfig}
%\end{minipage}
%\end{center}
%\end{figure*}

%
%\begin{figure*}
%\begin{minipage}[t]{1\linewidth}
%\centering
%\includegraphics[scale=0.18]{Fig/balancedTimeEfficiency.eps}
%\vspace{-2em}
%\caption{Average bandwidth utilization, $U_j=1000$}
%\label{fig:chap3:Energy560}
%\end{minipage}
%\begin{minipage}[t]{1\linewidth}
%\centering
%\includegraphics[scale=0.18]{Fig/UnbalancedTimeEfficiency.eps}
%\vspace{-2em}
%\caption{Average user experience degree, $U_j=1000$}
%\label{fig:chap3:Energy580}
%\end{minipage}
%\end{figure*}





